Modified Shine-Dalgarno sequences and methods of use thereof

ABSTRACT

Novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors are provided. Methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems are also provided. In particular embodiments of the invention, compounds and methods for high efficiency production of soluble protein in prokaryotic systems are provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 11/004,853, filed Dec. 7, 2004, which is a continuation of International Application Serial No. PCT/US03/19786, filed Jun. 25, 2003, which claims benefit under 35 U.S.C. 119(e) to U.S. Provisional Application Ser. Nos. 60/391,433, filed Jun. 26, 2002, and 60/406,630, filed Aug. 29, 2002, each of which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors. The present invention also relates to methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems, and in one aspect of the invention, provides for high efficiency production of soluble protein in prokaryotic systems.

BACKGROUND OF THE INVENTION

The level of production of a protein in a host cell is determined by three major factors: the number of copies of its structural gene within the cell, the efficiency with which the structural gene copies are transcribed and the efficiency with which the resulting messenger RNA (“mRNA”) is translated. The transcription and translation efficiencies are, in turn, dependent on nucleotide sequences that are normally situated ahead of the desired structural genes or the translated sequence. These nucleotide sequences, also known as expression control sequences, define, inter alia, the locations at which RNA polymerase binds (the promoter sequence to initiate transcription; see also EMBO J. 5:2995-3000 (1986)) and at which ribosomes bind and interact with the mRNA (the product of transcription) to initiate translation.

In most prokaryotes, the purine-rich ribosome binding site known as the Shine-Dalgarno (S-D) sequence assists with the binding and positioning of the 30S ribosome component relative to the start codon on the mRNA through interaction with a pyrimidine-rich region of the 16S ribosomal RNA. See, e.g., Shine & Dalgarno, Proc. Natl. Acad. Sci. USA 71:1342-46 (1974). The S-D sequence is located on the mRNA downstream from the start of transcription and upstream from the start of translation, typically from 4-14 nucleotides upstream of the start codon, and more typically from 8-10 nucleotides upstream of the start codon. Because of the role of the S-D sequence in translation, there is a direct relationship between the efficiency of translation and the efficiency (or strength) of the S-D sequence.
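The spacing relationship described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the motif `AGGAGG` is the commonly cited E. coli consensus core and is not one of the sequences of this document, the function name `find_sd_candidates` is hypothetical, and the spacing is counted here as the number of nucleotides between the last base of the motif and the first base of the start codon, which is one of several conventions in use.

```python
# Assumption: canonical consensus core, NOT a sequence disclosed in this document.
SD_CORE = "AGGAGG"


def find_sd_candidates(mrna, core=SD_CORE, min_gap=4, max_gap=14):
    """Return (core_start, start_codon_pos, gap) for each AUG preceded by the core.

    Positions are 0-based. `gap` is the number of nucleotides between the end
    of the core and the first base of the AUG (an assumed convention).
    DNA input is accepted and converted to RNA letters.
    """
    seq = mrna.upper().replace("T", "U")
    core = core.upper().replace("T", "U")
    hits = []
    start = seq.find("AUG")
    while start != -1:
        # Try every allowed spacing for this candidate start codon.
        for gap in range(min_gap, max_gap + 1):
            core_start = start - gap - len(core)
            if core_start >= 0 and seq[core_start:core_start + len(core)] == core:
                hits.append((core_start, start, gap))
        start = seq.find("AUG", start + 1)
    return hits


# Hypothetical test sequence: core at position 4, AUG at position 17.
print(find_sd_candidates("GCTAAGGAGGTTAACATATGAAACGC"))
```

A scan of this kind only checks for the presence and spacing of a motif; it says nothing about translation efficiency, which depends on the additional factors discussed below.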

Not all S-D sequences have the same efficiency, however. Accordingly, prior attempts have been made to increase the efficiency of ribosomal binding, positioning, and translation by, inter alia, changing the distance between the S-D sequence and the start codon, changing the composition of the space between the S-D sequence and the start codon, modifying an existing S-D sequence, using a heterologous S-D sequence, and manipulating the secondary structure of mRNA during the initiation of translation. Despite these changes, however, success in increasing protein expression efficiency in prokaryotic systems has remained an elusive and unpredictable goal due to a variety of factors, including, inter alia, the host cells used, the expression control sequences (including the S-D sequence) used, and the characteristics of the gene and protein being expressed. See, e.g., Stenstrom, et al., Gene 273(2):259-265 (2001); Komarova, et al., Bioorg. Khim. 27(4):282-290 (2001); Stenstrom, et al., Gene 263(1-2):273-284 (2001); and Mironova, et al., Microbiol. Res. 154(1):35-41 (1999). For example, efficient expression of soluble B. anthracis protective antigen (PA) has proved difficult in E. coli. See, e.g., Sharma, et al. Protein Expression and Purification 7:33-38 (1996) (indicating 0.5 mg/L at 70% purity); Chauhan, et al. Biochem. Biophys. Res. Commun. 283(2):308-15 (2001) (indicating 125 mg/L); Gupta, et al. Protein Expr. Purif. 16(3):369-76 (1999) (indicating 2 mg/L).

Accordingly, there remains a demand in the art for compositions and methods for increasing the efficiency of ribosome binding and translation in prokaryotic systems, thereby resulting in increased efficiency of protein expression. This demand is especially strong for proteins that are difficult to express in existing systems, and for proteins that are desired in large quantity for pharmacological, therapeutic, or industrial use.

SUMMARY OF THE INVENTION

The present invention encompasses novel Shine-Dalgarno sequences that result in increased efficiency of protein expression in prokaryotic systems. The present invention further relates to vectors comprising such S-D sequences and host cells transformed with such vectors. In particular embodiments, the present invention relates to methods for producing proteins and fragments thereof in prokaryotic systems using such S-D sequences, vectors, and host cells. In certain embodiments, methods of use of the S-D sequences, vectors, and host cells of the invention provide high efficiency production of soluble protein in prokaryotic systems, including prokaryotic in vitro translation systems.

In particular embodiments of the invention, the novel S-D sequence comprises (or alternately consists of) SEQ ID NO:2. In additional embodiments, the novel S-D sequence comprises (or alternately consists of) nucleotides 4-13 of SEQ ID NO:2. The invention also encompasses the S-D sequence of SEQ ID NO:18, described at paragraph 0426 of U.S. Provisional Application No. 60/368,548, filed Apr. 1, 2002, and in U.S. Provisional Application No. 60/331,478, filed Nov. 16, 2001, each of which is hereby incorporated by reference herein in its entirety.

The protein or fragment thereof may be of prokaryotic, eukaryotic, or viral origin, or may be artificial. In particular embodiments, the S-D sequences, vectors, and host cells of the invention are used to express B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (See, e.g., Sellman et al, JBC 276(11):8371-8376 (2001)), TL3, TL6, or other proteins. In certain embodiments, the S-D sequences, vectors, and host cells of the invention are used to express proteins that have previously been difficult to express in prokaryotic systems. The present invention also encompasses the combination of novel S-D sequences with a variety of expression control sequences, such as those described in detail in U.S. Pat. No. 6,194,168 (which is hereby incorporated by reference herein in its entirety), and in particular, expression control sequences comprising at least a portion of one or more lac operator sequences and a phage promoter comprising a −30 region.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a Shine-Dalgarno sequence of the present invention (SEQ ID NO: 2) and the Shine-Dalgarno sequence contained in the pHE4 expression vector (SEQ ID NO:17) (See U.S. Pat. No. 6,194,168). Bases matching the S-D sequence of the present invention (SEQ ID NO:2) are highlighted.

FIG. 2A depicts a map of the pHE6 vector (SEQ ID NO:1), which incorporates a S-D sequence of the invention. FIG. 2B depicts the pHE6 vector (SEQ ID NO:1) with the gene encoding mature Bacillus anthracis PA including an ETB signal sequence (SEQ ID NO:3) inserted.

FIGS. 3A-3B compare the efficiency of TL6 protein expression using the pHE4 vector (FIG. 3B) versus the pHE6 vector (FIG. 3A), which uses a S-D sequence of the invention. In particular, increased soluble TL6 expression with the pHE6 vector can be seen in FIG. 3A as a lack of “shadow” in the gel.

FIG. 4 depicts a gel showing the quantity and quality of PA after expression using pHE6 and subsequent purification. Using the compositions and methods of the invention, approximately 150 mg/L of soluble PA at greater than 96% purity (as measured by RP-HPLC) was obtained.

DETAILED DESCRIPTION OF THE INVENTION

The instant invention is directed to novel Shine-Dalgarno (ribosomal binding site) sequences. These S-D sequences result in increased efficiency of protein expression in prokaryotic systems. The S-D sequences of the present invention have been optimized through modification of several nucleotides. See, e.g., FIG. 1. In particular embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:2. In additional embodiments, the S-D sequences of the present invention comprise (or alternately consist of) nucleotides 4-13 of SEQ ID NO:2. In other embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:18.

In many embodiments, the S-D sequences of the present invention are used in prokaryotic cells. Exemplary bacterial cells suitable for use with the instant invention include E. coli, B. subtilis, S. aureus, S. typhimurium, and other bacteria used in the art. In other embodiments, the S-D sequences of the present invention are used in prokaryotic in vitro translation systems.

The present invention also relates to vectors and plasmids comprising one or more S-D sequences of the invention. Such vectors and plasmids generally also further comprise one or more restriction enzyme sites downstream of the S-D sequence for cloning and expression of a gene or polynucleotide of interest.

In certain embodiments, vectors and plasmids of the present invention further comprise additional expression control sequences, including but not limited to those described in U.S. Pat. No. 6,194,168, and in particular, M (SEQ ID NO:5), M+D (SEQ ID NO:6), U+D (SEQ ID NO:7), M+D1 (SEQ ID NO:8), and M+D2 (SEQ ID NO:9). More generally, the expression control sequence elements contemplated include bacterial or phage promoter sequences and functional variants thereof, whether natural or artificial; operator/repressor systems; and the lacIq gene (which confers tight regulation of the lac operator by blocking transcription of downstream (i.e., 3′) sequences).

The lac operator sequences contemplated for use in vectors and plasmids of the instant invention comprise (or alternately consist of) the entire lac operator sequence represented by the sequence 5′ AATTGTGAGCGGATAACAATTTCACACA 3′ (SEQ ID NO:10), or a portion thereof that retains at least partial activity, as described in U.S. Pat. No. 6,194,168. Activity is routinely determined using techniques well known in the art to measure the relative repressibility of a promoter sequence in the absence of an inducer, such as IPTG. This is done by comparing the relative amounts of protein expressed from expression control sequences comprising portions of the lac operator sequence and the full-length lac operator sequence. The partial operator sequence is measured relative to the full-length lac operator sequence (e.g., SEQ ID NO:10). In one embodiment, partial activity for the purposes of the present invention means activity reduced by no more than 100 fold relative to the full-length sequence. In alternative embodiments, partial activity for the purposes of the present invention means activity reduced by no more than 75, 50, 25, 20, 15, or 10 fold, relative to the full-length lac operator sequence. In a preferred embodiment, the activity of a partial operator sequence is reduced by no more than 10 fold relative to the activity of the full-length sequence.
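The fold-reduction thresholds above amount to a simple ratio test, sketched below. This is an illustrative calculation only: the function names are hypothetical, and the sketch assumes that the activity of each operator has already been reduced to a single positive number in the same arbitrary units, which the text does not prescribe.

```python
def fold_reduction(full_activity, partial_activity):
    """Fold reduction in activity of a partial operator relative to the full-length one."""
    if full_activity <= 0 or partial_activity <= 0:
        raise ValueError("activities must be positive numbers in the same units")
    return full_activity / partial_activity


def retains_partial_activity(full_activity, partial_activity, max_fold=100.0):
    """True if activity is reduced by no more than `max_fold` (100 by default)."""
    return fold_reduction(full_activity, partial_activity) <= max_fold


# Hypothetical measurements in arbitrary units.
print(fold_reduction(200.0, 25.0))                         # 8.0
print(retains_partial_activity(200.0, 25.0, max_fold=10))  # True
```

Passing `max_fold=10` corresponds to the preferred embodiment; the other thresholds listed (75, 50, 25, 20, 15) can be supplied the same way.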

In many embodiments, one or more S-D sequences of the invention are used in a vector comprising a T5 phage promoter sequence and two lac operator sequences wherein at least a portion of the full-length lac operator sequence (SEQ ID NO:10) is located within the spacer region between −12 and −30 of the expression control sequences described in U.S. Pat. No. 6,194,168. In particular embodiments, the operator sequence comprises (or alternately consists of) at least the sequence 5′-GTGAGCGGATAACAAT-3′ (SEQ ID NO:11).
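As a quick consistency check, the operator portion of SEQ ID NO:11 can be located within the full-length operator of SEQ ID NO:10 by a plain substring search. The sketch below uses only the two sequences as printed in this description; the helper name `locate` is hypothetical.

```python
LAC_OPERATOR_FULL = "AATTGTGAGCGGATAACAATTTCACACA"  # SEQ ID NO:10 as printed in the text
LAC_OPERATOR_PART = "GTGAGCGGATAACAAT"              # SEQ ID NO:11 as printed in the text


def locate(sub, full):
    """Return the 1-based inclusive (first, last) positions of `sub` in `full`, or None."""
    idx = full.find(sub)
    if idx == -1:
        return None
    return (idx + 1, idx + len(sub))


print(locate(LAC_OPERATOR_PART, LAC_OPERATOR_FULL))  # (5, 20)
```

Under this check, the 16-nucleotide sequence corresponds to nucleotides 5-20 of the 28-nucleotide full-length operator.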

The previously mentioned lac-operator sequences are negatively regulated by the lac-repressor. The corresponding repressor gene can be introduced into the host cell in a vector or through integration into the chromosome of a bacterium by known methods, such as by integration of the lacIq gene. See, e.g., Miller et al, supra; Calos, (1978) Nature 274:762-765. The vector encoding the repressor molecule may be the same vector that contains the expression control sequences and a gene or polynucleotide of interest or may be a separate vector.

The S-D sequences of the invention can routinely be inserted using procedures known in the art into any suitable expression vector that can replicate in gram-negative and/or gram-positive bacteria. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley Interscience, N.Y.). Suitable vectors and plasmids can be constructed from segments of chromosomal, non-chromosomal and synthetic DNA sequences, such as various known plasmid and phage DNAs. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989). Especially suitable vectors include plasmids of the pDS family. See Bujard et al, (1987) Methods in Enzymology, 155:416-433. Additional examples of preferred suitable plasmids include pBR322 and pBluescript (Stratagene, La Jolla, Calif.) based plasmids. Still additional examples of preferred suitable plasmids include pUC-based vectors, including pUC18 and pUC19 (New England Biolabs, Beverly, Mass.) and pREP4 (Qiagen Inc., Chatsworth, Calif.). Portions of vectors and plasmids encoding desired functions may also be combined to form new vectors with desired characteristics. For example, the origin of replication of pUC19 may be recombined with the kanamycin resistance gene of pREP4 to create a new vector with both desired characteristics.

Preferably, vectors and plasmids comprising one or more S-D sequences of the invention also contain sequences that allow replication of the plasmid to high copy number in the host bacterium of choice. Additionally, vector or plasmid embodiments of the invention that comprise expression control sequences may further comprise a multiple cloning site immediately downstream of the expression control sequences and the S-D sequence.

Vectors and plasmids comprising one or more S-D sequences of the invention may further comprise genes conferring antibiotic resistance. Preferred genes are those conferring resistance to ampicillin, chloramphenicol, and tetracycline. Especially preferred genes are those conferring resistance to kanamycin.

The optimized S-D ribosomal binding site of the invention can also be inserted into the chromosome of gram-negative and gram-positive bacterial cells using techniques known in the art. In this case, selection agents such as antibiotics, which are generally required when working with vectors, can be dispensed with.

Proteins of interest that can be expressed using the S-D sequences, vectors, and host cells of the invention include prokaryotic, eukaryotic, viral, or artificial proteins. Such proteins include, but are not limited to: enzymes; hormones; proteins having immunoregulatory, antiviral or antitumor activity; antibodies and fragments thereof (e.g., Fab, F(ab′), F(ab′)₂, single-chain Fv, disulfide-linked Fv); or antigens. In preferred embodiments, the protein to be expressed is B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (See, e.g., Sellman et al, JBC 276(11):8371-8376 (2001)), TL3, or TL6. Any effective signal sequence may be used in combination with the gene or polynucleotide of interest. In a preferred embodiment, the ETB signal sequence is used to enhance the expression of soluble protein.

The S-D sequences of the present invention provide for increased efficiency of protein expression in prokaryotic systems. Efficient expression means that the level of protein expression to be expected when using the S-D sequences of the instant invention is generally higher than levels previously reported in the art. In preferred embodiments, the resultant expressed protein can be highly purified to levels greater than 90% purity by RP-HPLC. Particularly preferred purity levels include 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, and near 100% purity, all of which are encompassed by the instant invention. It is expressly contemplated by the invention that the addition of one or more S-D sequences of the invention into any prokaryotic-based expression system, including and in addition to E. coli expression systems, will result in increased and more efficient protein expression.

The present invention also relates to methods of using the S-D sequences, vectors, plasmids, and host cells of the invention to produce proteins and fragments thereof. In one embodiment of the invention, a desired protein is produced by a method comprising:

(a) transforming a bacterium with a vector in which a polynucleotide encoding a desired protein is operably linked to a S-D sequence of the invention;

(b) culturing the transformed bacterium under suitable growth conditions; and

(c) isolating the desired protein from the culture.

In another embodiment of the invention, a desired protein is produced by a method comprising:

(a) inserting a S-D sequence of the invention and an expression control sequence into the chromosome of a suitable bacterium, wherein the S-D sequence and expression control sequence are each operably linked to a polynucleotide encoding a desired protein;

(b) cultivating the bacterium under suitable growth conditions; and

(c) isolating the desired protein from the culture.

The selection of a suitable host organism is determined by various factors that are well known in the art. Factors to be considered include, for example, compatibility with the selected vector, toxicity of the expression product, expression characteristics, necessary biological safety precautions and costs.

Suitable host organisms include, but are not limited to, gram-negative and gram-positive bacteria, such as E. coli, B. subtilis, S. aureus, and S. typhimurium strains. Preferred E. coli strains include DH5α (Gibco-BRL, Gaithersburg, Md.), XL-1 Blue (Stratagene), and W3110 (ATCC No. 27325). Other E. coli strains that can be used according to the present invention include other generally available strains such as E. coli 294 (ATCC No. 31446), E. coli RR1 (ATCC No. 31343) and M15.

EXAMPLES

The examples which follow are set forth to aid in understanding the invention but are not intended to, and should not be construed to, limit the scope of the invention in any way. The examples do not include detailed descriptions for conventional methods employed in the art, such as for the construction of vectors, the insertion of genes encoding polypeptides of interest into such vectors, or the introduction of the resulting plasmids into bacterial hosts. Such methods are described in numerous publications and can be carried out using recombinant DNA technology methods which are well known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley Interscience, N.Y.).

Example 1 pHE6 Design

The S-D sequence used in pHE6 (SEQ ID NO:2) was based on the S-D sequence of the pHE4 expression vector (SEQ ID NO:17) (See U.S. Pat. No. 6,194,168), with three base pair changes made as indicated in FIG. 1. Additionally, the pHE6 plasmid encodes the aminoglycoside phosphotransferase protein (conferring kanamycin resistance), the lacIq repressor, and includes a ColE1 replicon. Construction of the pHE4 plasmid upon which the pHE6 plasmid is based is described in U.S. Pat. No. 6,194,168.

Example 2 Method of Making and Purifying PA in Escherichia coli K-12

Using the following method, a post-purification final yield of soluble PA greater than 2 g from 1 kg of E. coli cell paste (approximately 150 mg/L) can be obtained from either shake flasks or bioreactors. See FIG. 4. The purity of such soluble PA, as judged by RP-HPLC analysis, is greater than 96-98%.

The bacterial host strain used for the production of recombinant wild-type PA from a recombinant plasmid DNA molecule is an E. coli K-12 derived strain. To express protein from the expression vectors, E. coli cells were transformed with the expression vectors and grown overnight (O/N) at 30° C. in 4 L shaker flasks containing 1 L Luria broth medium supplemented with kanamycin. The cultures were started at an optical density at 600 nm (O.D.₆₀₀) of 0.1. IPTG was added to a final concentration of 1 mM when the culture reached an O.D.₆₀₀ of between 0.4 and 0.6. IPTG-induced cultures were grown for an additional 3 hours. Cells were then harvested using methods known in the art, and the level of protein was detected using Western blot analysis. Soluble PA was then extracted from the periplasm and clarified by conventional means. The clarified supernatant was then purified using a Q Sepharose HP column (Amersham), concentrated, and further purified using a Biogel Hydroxyapatite HP column (Bio-Rad). Using the expression control sequence M+D1 (SEQ ID NO:8), high levels of repression in the absence of IPTG, and high levels of induced expression in the presence of IPTG were obtained.
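The induction step above involves two small calculations: deciding whether the culture is within the stated O.D.₆₀₀ window, and working out how much IPTG stock gives a 1 mM final concentration. The sketch below illustrates both. The function names are hypothetical, and the 1 M stock concentration is an assumption chosen for illustration; the text does not specify a stock concentration.

```python
def ready_to_induce(od600, low=0.4, high=0.6):
    """True if the culture density is within the induction window given in the text."""
    return low <= od600 <= high


def iptg_stock_volume_ml(culture_volume_ml, final_mm=1.0, stock_mm=1000.0):
    """Volume of IPTG stock (mL) for the desired final concentration.

    Uses C1 * V1 = C2 * V2 and ignores the small volume added by the stock
    itself. The default stock of 1000 mM (1 M) is an ASSUMED value.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the final concentration")
    return culture_volume_ml * final_mm / stock_mm


# 1 L culture, 1 mM final, assumed 1 M stock.
print(ready_to_induce(0.5))          # True
print(iptg_stock_volume_ml(1000.0))  # 1.0
```

With a different stock concentration, only the `stock_mm` argument changes; the 0.4 to 0.6 window and the 1 mM final concentration are the values stated in the example.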

Deposit of Microorganisms

Plasmid pHE6 was deposited with the American Type Culture Collection, 10801 University Boulevard, Manassas, Va. 20110-2209 on Jun. 20, 2002 and was given Accession No. PTA-4474. This culture has been accepted for deposit under the provisions of the Budapest Treaty on the International Recognition of Microorganisms for the Purposes of Patent Proceedings.

The disclosures of all publications (including patents, patent applications, journal articles, laboratory manuals, books, or other documents) cited herein are hereby incorporated by reference in their entireties.

The present invention is not to be limited in scope by the specific embodiments described herein, which are intended as illustrations of individual aspects of the invention. Functionally equivalent methods and components are within the scope of the invention, in addition to those shown and described herein and will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims. 

1. An isolated polynucleotide comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) nucleotides 4-13 of SEQ ID NO:2; and (c) SEQ ID NO:18.
 2. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (a).
 3. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (b).
 4. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (c).
 5. A vector comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) nucleotides 4-13 of SEQ ID NO:2; and (c) SEQ ID NO:18.
 6. The vector of claim 5 wherein the Shine-Dalgarno sequence is (a).
 7. The vector of claim 5 wherein the Shine-Dalgarno sequence is (b).
 8. The vector of claim 5 wherein the Shine-Dalgarno sequence is (c).
 9. The vector of claim 5, wherein said Shine-Dalgarno sequence is operably associated with a polynucleotide encoding a protein or fragment thereof.
 10. The vector of claim 9, wherein said polynucleotide encodes SEQ ID NO:4.
 11. The vector of claim 9, wherein said polynucleotide is operably associated with an expression control sequence.
 12. A method of producing a vector comprising inserting the Shine-Dalgarno sequence of claim 1 into a vector.
 13. A method of producing a host cell comprising transducing, transforming or transfecting a host cell with the vector of claim 5.
 14. A recombinant host cell comprising the Shine-Dalgarno sequence of claim 1.
 15. A recombinant host cell comprising the vector of claim 5.
 16. A recombinant host cell comprising the vector of claim 9.
 17. A method of producing a protein, comprising: (a) culturing the host cell of claim 16 under conditions suitable to produce the protein or fragment thereof; and (b) recovering the protein or fragment thereof from the cell culture.
 18. The method of claim 17, wherein said polynucleotide encodes SEQ ID NO:4. 